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The cultural diversity of culinary practice, as illustrated by the variety of regional cuisines, raises the question 
of whether there are any general patterns that determine the ingredient combinations used in food today or 
principles that transcend individual tastes and recipes. We introduce a flavor network that captures the flavor 
compounds shared by culinary ingredients. Western cuisines show a tendency to use ingredient pairs that share 
many flavor compounds, supporting the so-called food pairing hypothesis. By contrast, East Asian cuisines tend 
to avoid compound sharing ingredients. Given the increasing availability of information on food preparation, 
our data-driven investigation opens new avenues towards a systematic understanding of culinary practice. 

As omnivores, humans have historically faced the difficult task of identifying and gathering food that 
satisfies nutritional needs while avoiding foodborne illnesses 1 . This process has contributed to the current 
diet of humans, which is influenced by factors ranging from an evolved preference for sugar and fat to 
palatability, nutritional value, culture, ease of production, and climate 19 . The relatively small number of recipes 
in use (~10 6 , e.g. http://cookpad.com) compared to the enormous number of potential recipes (>10 15 , see 
Supplementary Information Sec S1.2), together with the frequent recurrence of particular combinations in 
various regional cuisines, indicates that we are exploiting but a tiny fraction of the potential combinations. 
Although this pattern itself can be explained by a simple evolutionary model 10 or data-driven approaches 11 , a 
fundamental question still remains: are there any quantifiable and reproducible principles behind our choice of 
certain ingredient combinations and avoidance of others? 

Although many factors such as colors, texture, temperature, and sound play an important role in food 
sensation 1215 , palatability is largely determined by flavor, representing a group of sensations including odors 
(due to molecules that can bind olfactory receptors), tastes (due to molecules that stimulate taste buds), and 
freshness or pungency (trigeminal senses) 16 . Therefore, the flavor compound (chemical) profile of the culinary 
ingredients is a natural starting point for a systematic search for principles that might underlie our choice of 
acceptable ingredient combinations. 

A hypothesis, which over the past decade has received attention among some chefs and food scientists, states that 
ingredients sharing flavor compounds are more likely to taste well together than ingredients that do not 17 (also see 
http://www.foodpairing.com). This food pairing hypothesis has been used to search for novel ingredient combina- 
tions and has prompted, for example, some contemporary restaurants to combine white chocolate and caviar, as they 
share trimethylamine and other flavor compounds, or chocolate and blue cheese that share at least 73 flavor 
compounds. As we search for evidence supporting (or refuting) any 'rules' that may underlie our recipes, we must 
bear in mind that the scientific analysis of any art, including the art of cooking, is unlikely to be capable of explaining 
every aspect of the artistic creativity involved. Furthermore, there are many ingredients whose main role in a recipe 
may not be only flavoring but something else as well (e.g. eggs' role to ensure mechanical stability or paprika's role to 
add vivid colors). Finally, the flavor of a dish owes as much to the mode of preparation as to the choice of particular 
ingredients 1218-19 . However, our hypothesis is that, given the large number of recipes we use in our analysis (56,498), 
such factors can be systematically filtered out, allowing for the discovery of patterns that may transcend specific 
dishes or ingredients. 

Here we introduce a network-based approach to explore the impact of flavor compounds on ingredient 
combinations. Efforts by food chemists to identify the flavor compounds contained in most culinary ingredients 
allows us to link each ingredient to 51 flavor compounds on average 20 \ We build a bipartite network 21 26 

'While finalizing this manuscript, an updated edition (6th Ed.) of Fenaroli's handbookof flavor ingredientshas been released. 
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Figure 1 | Flavor network. (A) The ingredients contained in two recipes (left column), together with the flavor compounds that are known to be present 
in the ingredients (right column). Each flavor compound is linked to the ingredients that contain it, forming a bipartite network. Some compounds 
(shown in boldface) are shared by multiple ingredients. (B) If we project the ingredient-compound bipartite network into the ingredient space, we obtain 
the flavor network, whose nodes are ingredients, linked if they share at least one flavor compound. The thickness of links represents the number of flavor 
compounds two ingredients share and the size of each circle corresponds to the prevalence of the ingredients in recipes. (C) The distribution of recipe size, 
capturing the number of ingredients per recipe, across the five cuisines explored in our study. (D) The frequency-rank plot of ingredients across the five 
cuisines show an approximately invariant distribution across cuisines. 



consisting of two different types of nodes: (i) 381 ingredients used in 
recipes throughout the world, and (ii) 1,021 flavor compounds that 
are known to contribute to the flavor of each of these ingredients 
(Fig. 1 A). A projection of this bipartite network is the flavor network 
in which two nodes (ingredients) are connected if they share at least 
one flavor compound (Fig. IB). The weight of each link represents 
the number of shared flavor compounds, turning the flavor network 
into a weighted network 27,22,23 . While the compound concentration in 
each ingredient and the detection threshold of each compound 
should ideally be taken into account, the lack of systematic data 
prevents us from exploring their impact (see Sec SI. 1.2 on data 
limitations). 

Since several flavor compounds are shared by a large number of 
ingredients, the resulting flavor network is too dense for direct visu- 
alization (average degree (fc)~214). We therefore use a backbone 
extraction method 28,29 to identify the statistically significant links for 
each ingredient given the sum of weights characterizing the particu- 
lar node (Fig. 2), see SI for details). Not surprisingly, each module in 
the network corresponds to a distinct food class such as meats (red) 



or fruits (yellow). The links between modules inform us of the flavor 
compounds that hold different classes of foods together. For 
instance, fruits and dairy products are close to alcoholic drinks, 
and mushrooms appear isolated, as they share a statistically signifi- 
cant number of flavor compounds only with other mushrooms. 

The flavor network allows us to reformulate the food pairing 
hypothesis as a topological property: do we more frequently use 
ingredient pairs that are strongly linked in the flavor network or 
do we avoid them? To test this hypothesis we need data on ingredient 
combinations preferred by humans, information readily available in 
the current body of recipes. For generality, we used 56,498 recipes 
provided by two American repositories (epicurious.com and allreci- 
pes.com) and to avoid a distinctly Western interpretation of the 
world's cuisine, we also used a Korean repository (menupan.com). 
The recipes are grouped into geographically distinct cuisines (North 
American,Western European, Southern European, Latin American, 
and East Asian; see Fig. 1 and Table S2). The average number of 
ingredients used in a recipe is around eight, and the overall distri- 
bution is bounded (Fig. 1C), indicating that recipes with a very large 
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Figure 2 | The backbone of the flavor network. Each node denotes an ingredient, the node color indicates food category, and node size reflects the 
ingredient prevalence in recipes. Two ingredients are connected if they share a significant number of flavor compounds, link thickness representing the 
number of shared compounds between the two ingredients. Adjacent links are bundled to reduce the clutter. Note that the map shows only the statistically 
significant links, as identified by the algorithm of Refs. 28,29 for p- value 0.04. A drawing of the full network is too dense to be informative. We use, however, 
the full network in our subsequent measurements. 



or very small number of ingredients are rare. By contrast, the popu- 
larity of specific ingredients varies over four orders of magnitude, 
documenting huge differences in how frequently various ingredients 
are used in recipes (Fig. ID), as observed in 10 . For example, jasmine 
tea, Jamaican rum, and 14 other ingredients are each found in only a 
single recipe (see SI SI. 2), but egg appears in as many as 20,951, more 
than one third of all recipes. 

Results 

Figure 3D indicates that North American and Western European 
cuisines exhibit a statistically significant tendency towards recipes 
whose ingredients share flavor compounds. By contrast, East Asian 
and Southern European cuisines avoid recipes whose ingredients 
share flavor compounds (see Fig. 3D for the Z-score, capturing the 
statistical significance of AN S ). The systematic difference between the 
East Asian and the North American recipes is particularly clear if we 
inspect the P(N s rand ) distribution of the randomized recipe dataset, 
compared to the observed number of shared compunds character- 
izing the two cuisines, N s . This distribution reveals that North 
American dishes use far more compound-sharing pairs than 
expected by chance (Fig. 3E), and the East Asian dishes far fewer 
(Fig. 3F). Finally, we generalize the food pairing hypothesis by 
exploring if ingredient pairs sharing more compounds are more likely 
to be used in specific cuisines. The results largely correlate with our 
earlier observations: in North American recipes, the more compounds 
are shared by two ingredients, the more likely they appear in recipes. 
By contrast, in East Asian cuisine the more flavor compounds two 



ingredients share, the less likely they are used together (Fig. 3G and 
3H; see SI for details and results on other cuisines). 

What is the mechanism responsible for these differences? That is, 
does Fig. 3C through H imply that all recipes aim to pair ingredients 
together that share (North America) or do not share (East Asia) 
flavor compounds, or could we identify some compounds respons- 
ible for the bulk of the observed effect? We therefore measured the 
contribution (see Methods) of each ingredient to the shared com- 
pound effect in a given cuisine c, quantifying to what degree its 
presence affects the magnitude of AN S . 

In Fig. 3IJ we show as a scatter plot ~/i (horizontal axis) and the 
frequency/) for each ingredient in North American and East Asian 
cuisines. The vast majority of the ingredients lie on the ~/i = 0 axis, 
indicating that their contribution to AN S is negligible. Yet, we observe 
a few frequently used outliers, which tend to be in the positive X; 
region for North American cuisine, and lie predominantly in the 
negative region for East Asian cuisine. This suggests that the food 
pairing effect is due to a few outliers that are frequently used in a 
particular cuisine, e.g. milk, butter, cocoa, vanilla, cream, and egg in 
the North America, and beef, ginger, pork, cayenne, chicken, and 
onion in East Asia. Support for the definitive role of these ingredients 
is provided in Fig. 3K,L where we removed the ingredients in order 
of their positive (or negative) contributions to AN S in the North 
American (or East Asian) cuisine, finding that the z-score, which 
measures the significance of the shared compound hypothesis, drops 
below two after the removal of only 13 (5) ingredients from North 
American (or East Asian) cuisine (see SI S2.2.2). Note, however, that 
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Figure 3 | Testing the food pairing hypothesis. Schematic illustration of two ingredient pairs, the first sharing many more (A) and the second much fewer 
(B) compounds than expected if the flavor compounds were distributed randomly. (C,D) To test the validity of the food pairing hypothesis, we construct 
10,000 random recipes and calculate AN S . We find that ingredient pairs in North American cuisines tend to share more compounds while East Asian 
cuisines tend to share fewer compounds than expected in a random recipe dataset. (E,F) The distributions P(N S ) for 10,000 randomized recipe datasets 
compared with the real values for East Asian and North American cuisine. Both cuisines exhibit significant p- values, as estimated using a z-test. (G,H) We 
enumerate every possible ingredient pair in each cuisine and show the fraction of pairs in recipes as a function of the number of shared compounds. To 
reduce noise, we only used data points calculated from more than 5 pairs. The p-values are calculated using a t-test. North American cuisine is biased 
towards pairs with more shared compounds while East Asian shows the opposite trend (see SI for details and results for other cuisines). Note that we used 
the full network, not the backbone shown in Fig. 2 to obtain these results. (I,J) The contribution and frequency of use for each ingredient in North American 
and East Asian cuisine. The size of the circles represents the relative prevalence p\ . North American and East Asian cuisine shows the opposite trends. (K,L) 
If we remove the highly contributing ingredients sequentially (from the largest contribution in North American cuisine and from the smallest contribution 
in East Asian cuisine), the shared compounds effect quickly vanishes when we removed five (East Asian) to fifteen (North American) ingredients. 



these ingredients play a disproportionate role in the cuisine under 
consideration — for example, the 13 key ingredients contributing 
to the shared compound effect in North American cuisine appear 
in 74.4% of all recipes. 

According to an empirical view known as "the flavor principle" 30 , 
the differences between regional cuisines can be reduced to a few key 
ingredients with specific flavors: adding soy sauce to a dish almost 
automatically gives it an oriental taste because Asians use soy sauce 
widely in their food and other ethnic groups do not; by contrast 
paprika, onion, and lard is a signature of Hungarian cuisine. Can 
we systematically identify the ingredient combinations responsible 
for the taste palette of a regional cuisine? To answer this question, we 
measure the authenticity of each ingredient (pf), ingredient pair (p£), 
and ingredient triplet (p^) (see Methods). In Fig. 4 we organize the 
six most authentic single ingredients, ingredient pairs and triplets for 
North American and East Asian cuisines in a flavor pyramid. The 
rather different ingredient classes (as reflected by their color) in the 
two pyramids capture the differences between the two cuisines: 
North American food heavily relies on dairy products, eggs and 
wheat; by contrast, East Asian cuisine is dominated by plant deriva- 
tives like soy sauce, sesame oil, and rice and ginger. Finally, the two 
pyramids also illustrate the different affinities of the two regional 



cuisines towards food pairs with shared compounds. The most 
authentic ingredient pairs and triplets in the North American cuisine 
share multiple flavor compounds, indicated by black links, but 
such compound-sharing links are rare among the most authentic 
combinations in East Asian cuisine. 

The reliance of regional cuisines on a few authentic ingredient 
combinations allows us to explore the ingredient-based relationship 
(similarity or dissimilarity) between various regional cuisines. For 
this we selected the six most authentic ingredients and ingredient 
pairs in each regional cuisine (i.e. those shown in Fig. 4A,B), gen- 
erating a diagram that illustrates the ingredients shared by various 
cuisines, as well as singling out those that are unique to a particular 
region (Fig. 4C). We once again find a close relationship between 
North American and Western European cuisines and observe that 
when it comes to its signature ingredient combinations Southern 
European cuisine is much closer to Latin American than Western 
European cuisine (Fig. 4C). 

Discussion 

Our work highlights the limitations of the recipe data sets currently 
available, and more generally of the systematic analysis of food 
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Figure 4 | Flavor principles. (A,B) Flavor pyramids for North American and East Asian cuisines. Each flavor pyramid shows the six most authentic 
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preparation data. By comparing two editions of the same dataset with 
significantly different coverage, we can show that our results are 
robust against data incompleteness (see SI SI. 1.2). Yet, better com- 
pound databases, mitigating the incompleteness and the potential 



biases of the current data, could significantly improve our under- 
standing of food. There is inherent ambiguity in the definition of a 
particular regional or ethnic cuisine. However, as discussed in SI S1.2, 
the correlation between different datasets, representing two distinct 
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perspectives on food (American and Korean), indicates that humans 
with different ethnic background have a rather consistent view on the 
composition of various regional cuisines. 

Recent work by Kinouchi et al. 10 observed that the frequency- rank 
plots of ingredients are invariant across four different cuisines, 
exhibiting a shape that can be well described by a Zipf-Mandelbrot 
curve. Based on this observation, they model the evolution of re- 
cipes by assuming a copy-mutate process, leading to a very similar 
frequency-rank curve. The copy-mutate model provides an explana- 
tion for how an ingredient becomes a staple ingredient of a cuisine: 
namely, having a high value or being a founder ingredient. The model 
assigns each ingredient a random fitness value, which represents the 
ingredient's nutritional value, availability, flavor, etc. For example, it 
has been suggested that some ingredient are selected because of their 
antimicrobial properties 6,7 . The mutation phase of the model replaces 
less fit ingredients with fitter ones. Meanwhile, the copy mechanism 
keeps copying the founder ingredients — ingredients in the early 
recipes — and makes them abundant in the recipes regardless of their 
fitness value. 

It is worthwhile to discuss the similarities and differences between 
the quantities we measured and the concepts of fitness and founders. 
First of all, prevalence (Pf) and authenticity (pf) are empirically 
measured values while fitness is an intrinsic hidden variable. 
Among the list of highly prevalent ingredients we indeed find old 
ingredients — founders — that have been used in the same geographic 
region for thousands of years. At the same time, there are relatively 
new ingredients such as tomatoes, potatoes, and peppers that were 
introduced to Europe and Asia just a few hundred years ago. These 
new, but prevalent ingredients can be considered to have high fitness 
values. If an ingredient has a high level of authenticity, then it is 
prevalent in a cuisine while not so prevalent in all other cuisines. 

Indeed, each culture has developed their own authentic ingredi- 
ents. It may indicate that fitness can vary greatly across cuisines or 
that the stochasticity of recipe evolution diverge the recipes in dif- 
ferent regions into completely different sets. More historical invest- 
igation will help us to estimate the fitness of ingredients and assess 
why we use the particular ingredients we currently do. The higher 
order fitness value suggested in Kinouchi et al. is very close to our 
concept of food pairing affinity. 

Another difference in our results is the number of ingredients in 
recipes. Kinouchi et al. reported that the average number of ingre- 
dients per recipe varies across different cookbooks. While we also 
observed variation in the number of ingredients per recipe, the pat- 
terns we found were not consistent with those found by Kinouchi 
et al. For instance, the French cookbook has more ingredients per 
recipe than a Brazillian one, but in our dataset we find the opposite 
result. We believe that a cookbook cannot represent a whole cuisine, 
and that cookbooks with more sophisticated recipes will tend to have 
more ingredients per recipe than cookbooks with everyday recipes. 
As more complete datasets become available, sharper conclusions 
can be drawn regarding the size variation between cuisines. 

Our contribution in this context is a study of the role that flavour 
compounds play in determining these fitness values. One possible 
interpretation of our results is that shared flavor compounds re- 
present one of several contributions to fitness value, and that, while 
shared compounds clearly play a significant role in some cuisines, 
other contributions may play a more dominant role in other cuisines. 
The fact that recipes rely on ingredients not only for flavor but also to 
provide the final textures and overall structure of a given dish pro- 
vides support for the idea that fitness values depend on a multitude of 
ingredient characteristics besides their flavor profile. 

In summary, our network-based investigation identifies a series 
of statistically significant patterns that characterize the way humans 
choose the ingredients they combine in their food. These patterns 
manifest themselves to varying degree in different geographic regions: 
while North American and Western European dishes tend to combine 



ingredients that share flavor compounds, East Asian cuisine avoids 
them. More generally this work provides an example of how the data- 
driven network analysis methods that have transformed biology and 
the social sciences in recent years can yield new insights in other areas, 
such as food science. 

Methods 

Shared compounds. To test the hypothesis that the choice of ingredients is driven by 
an appreciation for ingredient pairs that share flavor compounds (i.e. those linked in 
Fig. 2), we measured the mean number of shared compounds in each recipe, N s , 
comparing it with Nf™ obtained for a randomly constructed reference recipe dataset. 
For a recipe R that contains n R different ingredients, where each ingredient i has a set 
of flavor compounds Q, the mean number of shared compounds 

N S (R)= Z J „ E l c ^Q| (1) 



n R (n R -l) 



is zero if none of the ingredient pairs in the recipe share any flavor compounds. 
For example, the 'mustard cream pan sauce' recipe contains chicken broth, mustard, 
and cream, none of which share any flavor compounds (N S (R) = 0) in our dataset. 
Yet, N S (R) can reach as high as 60 for 'sweet and simple pork chops', a recipe 
containing apple, pork, and cheddar cheese {See Fig. 3A). To check whether recipes 
with high N S (R) are statistically preferred (implying the validity of the shared 
compound hypothesis) in a cuisine c with N c recipes, we calculate AN S — N* — iVj and , 
where 'real' and 'rand' indicates real recipes and randomly constructed recipes 
respectively and N s = S# N S (R)/N C (see SI for details of the randomization process). 
This random reference (null model) controls for the frequency of a particular 
ingredient in a given regional cuisine, hence our results are not affected by historical, 
geographical, and climate factors that determine ingredient availability (see SI SI. 1.2). 

Contribution. The contribution of each ingredient to the shared compound effect 
in a given cuisine c, quantifying to what degree its presence affects the magnitude of 
AN S , is defined by 



2fi Y#J)\Gnq 



(2) 



where fa represents the ingredient z"s number of occurrence. An ingredient's 
contribution is positive (negative) if it increases (decreases) AN S . 

Authenticity, we define the prevalence Pf of each ingredient * in a cuisine c as 
^i =n i/ where n\ is the number of recipes that contain the particular ingredient i 
in the cuisine and N c is the total number of recipes in the cuisine. The relative 
prevalence p^ — P- — (pf ^ measures the authenticity — the difference between the 
prevalence of i in cuisine c and the average prevalence of i in all other cuisines. We can 
also identify ingredient pairs or triplets that are overrepresented in a particular cuisine 

relative to other cuisines by defining the relative pair prevalences p c v = — / dp > 



and triplet prevalences = P^. - 



fN c and P' 



ijk~ 
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